Start node determination for tree traversal for shadow rays in graphics processing

ABSTRACT

At least one processor may organize a plurality of primitives of a scene in a hierarchical data structure, wherein a plurality of bounding volumes are associated with a plurality of nodes of the hierarchical data structure. The at least one processor may rasterize a representation of each of the plurality of bounding volumes to an off-screen render target in the memory. The at least one processor may determine, based at least in part on a pixel in the off-screen render target that maps to a ray in the scene, a non-root node of the hierarchical data structure associated with the pixel as a start node to start traversal of the hierarchical data structure. The at least one processor may traverse the hierarchical data structure starting from the start node to determine whether the ray in the scene intersects one of the plurality of primitives.

TECHNICAL FIELD

This disclosure relates to graphics processing, including traversing a hierarchical data structure to determine a ray-primitive intersection for shadow ray tracing.

BACKGROUND

In computer graphics, shadow rendering is a technique in which shadows are added to a three-dimensional (3D) scene based on whether particular locations of the scene are illuminated by a light source. A graphics processing unit (GPU) may perform such shadow rendering for a particular location of the 3D scene by emanating a vector called a shadow ray from the location towards the light source. If the GPU determines that the shadow ray intersects a primitive in the scene geometry, the GPU may determine that the source location is in shadow and is not illuminated by the light source.

In order to accelerate the process of finding shadow ray-primitive intersections, the GPU may arrange the scene geometry of the 3D scene in an acceleration data structure (ADS) that hierarchically groups scene primitives (e.g., triangles). The GPU may recursively traverse the ADS by performing shadow ray intersection tests on the hierarchy of scene primitives to determine whether the shadow ray intersects a primitive of the scene. If the GPU determines that the shadow ray emanating from a particular location intersects a primitive based on the traversal of the ADS, the GPU may determine that the particular location is occluded from the light source by at least the primitive.

SUMMARY

In one aspect, the disclosure is directed to a method. The method includes organizing, by at least one processor, a plurality of primitives of a scene in a hierarchical data structure, wherein a plurality of bounding volumes are associated with a plurality of nodes of the hierarchical data structure. The method further includes rasterizing, by the at least one processor, a representation of each of the plurality of bounding volumes to an off-screen render target. The method further includes determining, by the at least one processor and based at least in part on a pixel that intersects a first ray in the off-screen render target, a non-root node of the hierarchical data structure associated with the pixel as a start node to start traversal of the hierarchical data structure. The method further includes traversing, by the at least one processor, the hierarchical data structure starting from the start node to determine whether a second ray in the scene intersects one of the plurality of primitives.

In another aspect, the disclosure is directed to an apparatus configured to process graphics data. The apparatus includes a memory. The apparatus further includes at least one processor configured to: organize a plurality of primitives of a scene in a hierarchical data structure, wherein a plurality of bounding volumes are associated with a plurality of nodes of the hierarchical data structure; rasterize a representation of each of the plurality of bounding volumes to an off-screen render target in the memory; determine, based at least in part on a pixel that intersects a first ray in the off-screen render target, a non-root node of the hierarchical data structure associated with the pixel as a start node to start traversal of the hierarchical data structure; and traverse the hierarchical data structure starting from the start node to determine whether a second ray in the scene intersects one of the plurality of primitives.

In another aspect, the disclosure is directed to an apparatus. The apparatus includes means for organizing a plurality of primitives of a scene in a hierarchical data structure, wherein a plurality of bounding volumes are associated with a plurality of nodes of the hierarchical data structure. The apparatus further includes means for rasterizing a representation of each of the plurality of bounding volumes to an off-screen render target. The apparatus further includes means for determining, based at least in part on a pixel that intersects a first ray in the off-screen render target, a non-root node of the hierarchical data structure associated with the pixel as a start node to start traversal of the hierarchical data structure. The apparatus further includes means for traversing the hierarchical data structure starting from the start node to determine whether a second ray in the scene intersects one of the plurality of primitives.

In another aspect, the disclosure is directed to a computer-readable storage medium storing instructions. The instructions, when executed, cause one or more programmable processors to: organize a plurality of primitives of a scene in a hierarchical data structure, wherein a plurality of bounding volumes are associated with a plurality of nodes of the hierarchical data structure; rasterize a representation of each of the plurality of bounding volumes to an off-screen render target in the memory; determine, based at least in part on a pixel that intersects a first ray in the off-screen render target, a non-root node of the hierarchical data structure associated with the pixel as a start node to start traversal of the hierarchical data structure; and traverse the hierarchical data structure starting from the start node to determine whether a second ray in the scene intersects one of the plurality of primitives.

The details of one or more aspects of the disclosure are set forth in the accompanying drawings and the description below. Other features, objects, and advantages of the disclosure will be apparent from the description and drawings, and from the claims.

BRIEF DESCRIPTION OF DRAWINGS

FIG. 1 is a block diagram illustrating an example computing device that may be configured to implement one or more aspects of this disclosure.

FIG. 2 is a block diagram illustrating example implementations of the CPU, the GPU, and the system memory of FIG. 1 in further detail.

FIG. 3 is a conceptual diagram illustrating an example graphics scene onto which the GPU may perform shadow ray tracing and an example partitioning of the graphics scene into bounding volumes.

FIG. 4 illustrates an example hierarchical data structure having nodes that are associated with the example bounding volumes and primitives shown in FIG. 3.

FIG. 5 is a block diagram illustrating an example representation of bounding volumes rasterized to an example off-screen render target.

FIG. 6 is a flowchart illustrating an example process for determining the start node for traversing an example hierarchical tree structure to find a shadow ray-primitive intersection.

DETAILED DESCRIPTION

In general, this disclosure describes techniques for a GPU to more efficiently perform shadow rendering for a graphics scene by determining whether shadow rays directed towards the light source of the scene intersect primitives that are arranged in a hierarchical data structure such as an ADS. Instead of traversing the hierarchical data structure from the root node, the GPU may start traversal of the hierarchical data structure from a node other than the root node, thereby reducing the number of ray intersection tests that the GPU may perform. The GPU may determine a non-root node of the hierarchical data structure from which to start traversal by utilizing shader units from its graphics processing pipeline to rasterize a subset of the bounding volumes associated with interior nodes and with leaf nodes of the hierarchical data structure to an off-screen render target. The GPU may determine, from the off-screen render target, an interior non-root node from which to start traversal of the hierarchical data structure, thereby reducing the number of shadow ray intersection tests the GPU may perform to traverse the hierarchical data structure. Because rasterizing to an off-screen render target is relatively less computationally expensive than performing ray intersection tests, the GPU may realize a substantial increase in shadow rendering performance by rasterizing to the off-screen render target and determining an interior non-root node from which to start traversal of the hierarchical data structure compared with traversing the hierarchical data structure from the root node.

In accordance with aspects of the present disclosure, the GPU may organize a plurality of primitives of a scene in a hierarchical data structure, wherein a plurality of bounding volumes are associated with a plurality of nodes of the hierarchical data structure. The GPU may further rasterize a representation of each of the plurality of bounding volumes to an off-screen render target in the memory. The GPU may further determine, based at least in part on a pixel that intersects a first ray in the off-screen render target, a non-root node of the hierarchical data structure associated with the pixel as a start node to start traversal of the hierarchical data structure. The GPU may further traverse the hierarchical data structure starting from the start node to determine whether a second ray in the scene intersects one of the plurality of primitives.

FIG. 1 is a block diagram illustrating an example computing device that may be configured to implement one or more aspects of this disclosure. As shown in FIG. 1, device 2 may be a computing device including but not limited to video devices, media players, set-top boxes, wireless handsets such as mobile telephones and so-called smartphones, personal digital assistants (PDAs), desktop computers, laptop computers, gaming consoles, video conferencing units, tablet computing devices, home appliances, industrial appliances, kiosks, and the like. In the example of FIG. 1, device 2 may include central processing unit (CPU) 6, system memory 10, and GPU 12. Device 2 may also include display processor 14, transceiver module 3, user interface 4, and display 8. Transceiver module 3 and display processor 14 may both be part of the same integrated circuit (IC) as CPU 6 and/or GPU 12, may both be external to the IC or ICs that include CPU 6 and/or GPU 12, or may be formed in the IC that is external to the IC that includes CPU 6 and/or GPU 12.

Device 2 may include additional modules or units not shown in FIG. 1 for purposes of clarity. For example, device 2 may include a speaker and a microphone, neither of which are shown in FIG. 1, to effectuate telephonic communications in examples where device 2 is a mobile wireless telephone, or a speaker where device 2 is a media player. Device 2 may also include a video camera. Furthermore, the various modules and units shown in device 2 may not be necessary in every example of device 2. For example, user interface 4 and display 8 may be external to device 2 in examples where device 2 is a desktop computer or other device that is equipped to interface with an external user interface or display.

Examples of user interface 4 include, but are not limited to, a trackball, a mouse, a keyboard, and other types of input devices. User interface 4 may also be a touch screen and may be incorporated as a part of display 8. Transceiver module 3 may include circuitry to allow wireless or wired communication between device 2 and another device or a network. Transceiver module 3 may include modulators, demodulators, amplifiers and other such circuitry for wired or wireless communication.

CPU 6 may be a microprocessor, such as a central processing unit (CPU) configured to process instructions of a computer program for execution. CPU 6 may comprise a general-purpose or a special-purpose processor that controls operation of device 2. A user may provide input to device 2 to cause CPU 6 to execute one or more software applications. The software applications that execute on CPU 6 may include, for example, an operating system, a word processor application, an email application, a spreadsheet application, a media player application, a video game application, a graphical user interface application or another program. Additionally, CPU 6 may execute GPU driver 22 for controlling the operation of GPU 12. The user may provide input to device 2 via one or more input devices (not shown) such as a keyboard, a mouse, a microphone, a touch pad or another input device that is coupled to device 2 via user interface 4.

The software applications that execute on CPU 6 may include one or more graphics rendering instructions that instruct CPU 6 to cause the rendering of graphics data to display 8. In some examples, the software application instructions may conform to a graphics application programming interface (API), such as, e.g., an Open Graphics Library (OpenGL®) API, an Open Graphics Library Embedded Systems (OpenGL ES) API, a Direct3D API, an X3D API, a RenderMan API, a WebGL API, or any other public or proprietary standard graphics API. In order to process the graphics rendering instructions, CPU 6 may issue one or more graphics rendering commands to GPU 12 (e.g., through GPU driver 22) to cause GPU 12 to perform some or all of the rendering of the graphics data. In some examples, the graphics data to be rendered may include a list of graphics primitives, e.g., points, lines, triangles, quadrilaterals, triangle strips, etc.

GPU 12 may be configured to perform graphics operations to render one or more graphics primitives to display 8. Thus, when one of the software applications executing on CPU 6 requires graphics processing, CPU 6 may provide graphics commands and graphics data to GPU 12 for rendering to display 8. The graphics data may include, e.g., drawing commands, state information, primitive information, texture information, etc. GPU 12 may, in some instances, be built with a highly-parallel structure that provides more efficient processing of complex graphic-related operations than CPU 6. For example, GPU 12 may include a plurality of processing elements, such as shader units, that are configured to operate on multiple vertices or pixels in a parallel manner. The highly parallel nature of GPU 12 may, in some instances, allow GPU 12 to draw graphics images (e.g., GUIs and two-dimensional (2D) and/or three-dimensional (3D) graphics scenes) onto display 8 more quickly than drawing the scenes directly to display 8 using CPU 6.

GPU 12 may, in some instances, be integrated into a motherboard of device 2. In other instances, GPU 12 may be present on a graphics card that is installed in a port in the motherboard of device 2 or may be otherwise incorporated within a peripheral device configured to interoperate with device 2. GPU 12 may include one or more processors, such as one or more microprocessors, application specific integrated circuits (ASICs), field programmable gate arrays (FPGAs), digital signal processors (DSPs), or other equivalent integrated or discrete logic circuitry. GPU 12 may also include one or more processor cores, so that GPU 12 may be referred to as a multi-core processor.

GPU 12 may be directly coupled to graphics memory 40. Thus, GPU 12 may read data from and write data to graphics memory 40 without using a bus. In other words, GPU 12 may process data locally using a local storage, i.e., graphics memory 40, instead of off-chip memory. Such graphics memory 40 may be referred to as on-chip memory. This allows GPU 12 to operate in a more efficient manner by eliminating the need for GPU 12 to read and write data via a bus, which may experience heavy bus traffic. In some instances, however, GPU 12 may not include a separate memory, but instead utilize system memory 10 via a bus. Graphics memory 40 may include one or more volatile or non-volatile memories or storage devices, such as, e.g., random access memory (RAM), static RAM (SRAM), dynamic RAM (DRAM), erasable programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), Flash memory, magnetic data media, or optical storage media.

In some examples, GPU 12 may store a fully formed image in system memory 10. Display processor 14 may retrieve the image from system memory 10 and output values that cause the pixels of display 8 to illuminate to display the image. Display 8 may be the display of device 2 that displays the image content generated by GPU 12. Display 8 may be a liquid crystal display (LCD), an organic light emitting diode display (OLED), a cathode ray tube (CRT) display, a plasma display, or another type of display device.

In accordance with aspects of the present disclosure, GPU 12 may organize a plurality of primitives of a scene in a hierarchical data structure, wherein a plurality of bounding volumes are associated with a plurality of nodes of the hierarchical data structure. GPU 12 may further rasterize a representation of each of the plurality of bounding volumes to an off-screen render target in the memory. GPU 12 may further determine, based at least in part on a pixel that intersects a first ray in the off-screen render target, a non-root node of the hierarchical data structure associated with the pixel as a start node to start traversal of the hierarchical data structure. GPU 12 may further traverse the hierarchical data structure starting from the start node to determine whether a second ray in the scene intersects one of the plurality of primitives.

FIG. 2 is a block diagram illustrating example implementations of CPU 6, GPU 12, and system memory 10 of FIG. 1 in further detail. As shown in FIG. 2, CPU 6 may include at least one software application 18, graphics API 20, and GPU driver 22, each of which may be one or more software applications or services that execute on CPU 6.

Memory available to CPU 6 and GPU 12 may include system memory 10 and frame buffer 16. Frame buffer 16 may be a part of system memory 10 or may be separate from system memory 10, and may store rendered image data.

Software application 18 may be any application that utilizes the functionality of GPU 12. For example, software application 18 may be a GUI application, an operating system, a portable mapping application, a computer-aided design program for engineering or artistic applications, a video game application, or another type of software application that uses 2D or 3D graphics.

Software application 18 may include one or more drawing instructions that instruct GPU 12 to render a graphical user interface (GUI) and/or a graphics scene. For example, the drawing instructions may include instructions that define a set of one or more graphics primitives to be rendered by GPU 12. In some examples, the drawing instructions may, collectively, define all or part of a plurality of windowing surfaces used in a GUI. In additional examples, the drawing instructions may, collectively, define all or part of a graphics scene that includes one or more graphics objects within a model space or world space defined by the application.

Software application 18 may invoke GPU driver 22, via graphics API 20, to issue one or more commands to GPU 12 for rendering one or more graphics primitives into displayable graphics images. For example, software application 18 may invoke GPU driver 22, via graphics API 20, to provide primitive definitions to GPU 12. In some instances, the primitive definitions may be provided to GPU 12 in the form of a list of drawing primitives, e.g., triangles, rectangles, triangle fans, triangle strips, etc.

The primitive definitions provided to GPU 12 may include vertex specifications that specify one or more vertices associated with the primitives to be rendered. The vertex specifications may include positional coordinates for each vertex and, in some instances, other attributes associated with the vertex, such as, e.g., color coordinates, normal vectors, and texture coordinates. The primitive definitions may also include primitive type information (e.g., triangle, rectangle, triangle fan, triangle strip, etc.), scaling information, rotation information, and the like.

Based on the instructions issued by software application 18 to GPU driver 22, GPU driver 22 may formulate one or more commands that specify one or more operations for GPU 12 to perform in order to render the primitive. When GPU 12 receives a command from CPU 6, processor cluster 46 may execute a graphics processing pipeline to decode the command and may configure the graphics processing pipeline to perform the operation specified in the command. For example, a command engine of the graphics processing pipeline may read primitive data and assemble the data into primitives for use by the other graphics pipeline stages in the graphics processing pipeline. After performing the specified operations, GPU 12 outputs the rendered data to frame buffer 16 associated with a display device.

Frame buffer 16 stores destination pixels for GPU 12. Each destination pixel may be associated with a unique screen pixel location. In some examples, frame buffer 16 may store color components and a destination alpha value for each destination pixel. For example, frame buffer 16 may store Red, Green, Blue, Alpha (RGBA) components for each pixel where the “RGB” components correspond to color values and the “A” component corresponds to a destination alpha value. Frame buffer 16 may also store depth values for each destination pixel. In this way, frame buffer 16 may be said to store graphics data (e.g., a surface). Although frame buffer 16 and system memory 10 are illustrated as being separate memory units, in other examples, frame buffer 16 may be part of system memory 10. Once GPU 12 has rendered all of the pixels of a frame into frame buffer 16, frame buffer 16 may output the finished frame to display 8 for display.

Processor cluster 46 may include one or more programmable processing units 42 and/or one or more fixed function processing units 44. Programmable processing units 42 may include, for example, programmable shader units that are configured to execute one or more shader programs that are downloaded onto GPU 12 from CPU 6. In some examples, programmable processing units 42 may be referred to as “shader processors” or “unified shaders,” and may perform geometry, vertex, pixel, or other shading operations to render graphics. The shader units may each include one or more components for fetching and decoding operations, one or more ALUs for carrying out arithmetic calculations, one or more memories, caches, and registers.

GPU 12 may designate programmable processing units 42 to perform a variety of shading operations such as vertex shading, hull shading, domain shading, geometry shading, fragment shading, and the like by sending commands to programmable processing units 42 to execute one or more of a vertex shader stage, tessellation stages, a geometry shader stage, a rasterization stage, and a fragment shader stage in the graphics processing pipeline. In some examples, GPU driver 22 may cause a compiler executing on CPU 6 to compile one or more shader programs, and to download the compiled shader programs onto programmable processing units 42 contained within GPU 12.

The shader programs may be written in a high level shading language, such as, e.g., an OpenGL Shading Language (GLSL), a High Level Shading Language (HLSL), a C for Graphics (Cg) shading language, an OpenCL C kernel, etc. The compiled shader programs may include one or more instructions that control the operation of programmable processing units 42 within GPU 12. For example, the shader programs may include vertex shader programs that may be executed by programmable processing units 42 to perform the functions of the vertex shader stage, tessellation shader programs that may be executed by programmable processing units 42 to perform the functions of the tessellation stages, geometry shader programs that may be executed by programmable processing units 42 to perform the functions of the geometry shader stage, and/or fragment shader programs that may be executed by programmable processing units 42 to perform the functions of the fragment shader stage. A vertex shader program may control the execution of a programmable vertex shader unit or a unified shader unit, and include instructions that specify one or more per-vertex operations.

Processor cluster 46 may also include fixed function processing units 44. Fixed function processing units 44 may include hardware logic circuitry that is hard-wired to perform certain functions. Although fixed function processing units 44 may be configurable, via one or more control signals for example, to perform different functions, the fixed function hardware typically does not include a program memory that is capable of receiving user-compiled programs. In some examples, fixed function processing units 44 in processor cluster 46 may include, for example, processing units that perform raster operations, such as, e.g., depth testing, scissors testing, alpha blending, low resolution depth testing, etc., to perform the functions of the rasterization stage of the graphics processing pipeline.

Graphics memory 40 may be on-chip storage or memory that is physically integrated into the integrated circuit of GPU 12. Because graphics memory 40 is on-chip, GPU 12 may be able to read values from or write values to graphics memory 40 more quickly than reading values from or writing values to system memory 10 via a system bus.

In accordance with aspects of the present disclosure, processor cluster 46 may perform operations as discussed above to execute a graphics processing pipeline to render a three-dimensional (3D) graphics scene that includes one or more graphics objects within a model space or world space, including rendering a plurality of primitives that make up the one or more graphics objects in the 3D scene. Processor cluster 46 may also perform ray tracing of the 3D graphics scene by tracing a path of light from a light source through pixels of the 3D graphics scene, to determine which pixels of the 3D graphics scene are illuminated by the light source.

As part of performing ray tracing of the 3D graphics scene, processor cluster 46 may perform shadow rendering of the 3D graphics scene to determine surfaces of the 3D graphics scene that are not illuminated by the light source (and therefore are in shadows). Such surfaces may be in shadows because one or more other solid surfaces block light rays emitted by the light source from reaching those surfaces. To determine whether a particular location in the 3D graphics scene is shaded from the light source by a surface, processor cluster 46 may cast a vector called a shadow ray from the particular location in the direction of the light source. If processor cluster 46 determines that the shadow ray cast from the location intersects a primitive that is situated between the location and the light source, then processor cluster 46 may deem the location from which the shadow ray originates to be occluded from the light source.
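The occlusion test described above can be illustrated with a minimal sketch in Python. The sketch assumes a point light and a single triangle primitive, and it uses the well-known Möller-Trumbore ray-triangle test, which this disclosure does not itself specify. The function and parameter names (`shadow_ray_hits_triangle`, `origin`, `light_pos`) are hypothetical and chosen only for illustration.

```python
def sub(a, b):
    """Component-wise a - b for 3D vectors."""
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a, b):
    """Dot product of two 3D vectors."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a, b):
    """Cross product of two 3D vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def shadow_ray_hits_triangle(origin, light_pos, tri, eps=1e-9):
    """Return True if the triangle lies between origin and light_pos.

    The shadow ray direction is left unnormalized, so t = 0 is the
    shaded location and t = 1 is the light source (an assumed convention).
    """
    direction = sub(light_pos, origin)
    v0, v1, v2 = tri
    e1, e2 = sub(v1, v0), sub(v2, v0)
    p = cross(direction, e2)
    det = dot(e1, p)
    if abs(det) < eps:          # ray is parallel to the triangle plane
        return False
    inv = 1.0 / det
    s = sub(origin, v0)
    u = dot(s, p) * inv         # first barycentric coordinate
    if u < 0.0 or u > 1.0:
        return False
    q = cross(s, e1)
    v = dot(direction, q) * inv  # second barycentric coordinate
    if v < 0.0 or u + v > 1.0:
        return False
    t = dot(e2, q) * inv        # position of the hit along the ray
    return eps < t < 1.0        # only hits between location and light count
```

For example, with the triangle `[(-1, -1, 1), (1, -1, 1), (0, 1, 1)]`, a location at the origin, and a light at `(0, 0, 2)`, the triangle sits between the location and the light, so the location would be deemed occluded. Moving the light to `(0, 0, 0.5)` places it in front of the triangle, and the same call reports no occlusion.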

To determine whether a particular shadow ray originating from a particular location of the 3D graphics scene and directed towards a light source for the 3D graphics scene intersects a primitive in the 3D graphics scene, GPU 12 may organize the primitives in the 3D graphics scene into a hierarchical structure, such as acceleration data structure (ADS) 41, that hierarchically groups scene primitives (e.g., triangles). GPU 12 may store ADS 41 in graphics memory 40, system memory 10, in shader memory (not shown) of processor cluster 46, or in shared system/graphics memory (not shown). Details of how GPU 12 uses ADS 41 to determine shadow ray-primitive intersections are discussed in further detail with respect to FIGS. 3 and 4.

FIG. 3 is a conceptual diagram illustrating an example graphics scene onto which GPU 12 may perform shadow ray tracing and an example partitioning of the graphics scene into bounding volumes. As shown in FIG. 3, graphics scene 50 may be a 2D or 3D graphics scene that includes primitives 52A-52E (hereafter “primitives 52”). As part of the shadow mapping process, GPU 12 may determine, for a particular location in graphics scene 50, whether a shadow ray originating from the particular location towards a light source intersects one of primitives 52. If GPU 12 determines that the shadow ray intersects a primitive that is situated between the light source and the location from which the shadow ray originates, then the location from which the shadow ray originates is shadowed from the light source by the intersected primitive and is therefore not illuminated by the light source.

GPU 12 may systematically determine whether a primitive in primitives 52 intersects a particular shadow ray by dividing graphics scene 50, hierarchically arranging the divided portions of graphics scene 50, and recursively traversing the hierarchy of the divided portions of graphics scene 50. GPU 12 may conceptually partition primitives 52 into bounding volumes 56A-56E (“bounding volumes 56”). Bounding volumes 56 may be axis-aligned bounding boxes (AABBs), which may be bounding boxes having a minimized area within which all points of the enclosed primitives may lie. The bounding boxes may be axis-aligned such that the edges of the boxes may be parallel to the coordinate axes (e.g., the x, y, and z axes).
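As a minimal sketch of the axis-aligned bounding box idea, the Python code below computes the smallest box enclosing a set of vertices, merges two boxes into a parent box, and checks enclosure. The representation of a box as a `(min_corner, max_corner)` pair and the helper names `aabb_of`, `union`, and `encloses` are assumptions made for this illustration rather than anything defined in the disclosure.

```python
def aabb_of(points):
    """Smallest axis-aligned box containing all points.

    Returns (min_corner, max_corner); each corner is an (x, y, z) tuple.
    """
    xs, ys, zs = zip(*points)
    return (min(xs), min(ys), min(zs)), (max(xs), max(ys), max(zs))


def union(a, b):
    """Smallest axis-aligned box enclosing boxes a and b."""
    lo = tuple(min(p, q) for p, q in zip(a[0], b[0]))
    hi = tuple(max(p, q) for p, q in zip(a[1], b[1]))
    return lo, hi


def encloses(outer, inner):
    """True if box inner lies entirely within box outer."""
    return (all(o <= i for o, i in zip(outer[0], inner[0])) and
            all(i <= o for i, o in zip(inner[1], outer[1])))


# Two made-up triangles standing in for primitives of a scene.
tri_a = [(0, 0, 0), (1, 0, 0), (0, 2, 0)]
tri_b = [(3, 1, 1), (4, 1, 1), (3, 3, 2)]

box_a = aabb_of(tri_a)
box_b = aabb_of(tri_b)
parent = union(box_a, box_b)  # a box that could sit one level higher in a hierarchy
```

Because each box is stored only as two corners, the enclosure and merge operations reduce to per-axis comparisons, which is one reason AABBs are a common choice for bounding volumes.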

Bounding volume 56A may be a bounding box that bounds all primitives 52 of graphics scene 50. Bounding volumes 56B and 56C may be subsets of bounding volume 56A in that bounding volumes 56B and 56C bound a subset of the portion of scene 50 bound by bounding volume 56A. Bounding volume 56B may bound primitives 52A and 52B, and bounding volume 56C may bound primitives 52C, 52D, and 52E. Bounding volumes 56D and 56E may be subsets of bounding volume 56C, and may bound a subset of the portion of scene 50 bound by bounding volume 56C. Bounding volume 56D may bound primitives 52C and 52D, and bounding volume 56E may bound primitive 52E.

In the example shown in FIG. 3, GPU 12 may partition primitives 52 into five bounding volumes 56A-56E. GPU 12 is not limited to five bounding volumes 56A-56E but may, depending on the scene and the number of primitives in the scene, use more or fewer than five bounding volumes. In some examples, GPU 12 may create additional bounding volumes as subsets of bounding volume 56B to individually bound primitives 52A and 52B, respectively. In some examples, CPU 6 may also be configured to partition primitives 52 into bounding volumes 56.

Bounding volumes 56 may be arranged into a hierarchical structure such that GPU 12 may traverse the hierarchical structure to determine possible shadow ray-primitive intersections. FIG. 4 illustrates an example hierarchical data structure having nodes that are associated with the bounding volumes 56 and primitives 52 shown in FIG. 3. As discussed above, scene primitives of a scene may be organized into a hierarchical structure such as ADS 41, and GPU 12 may traverse ADS 41 to determine possible shadow ray-primitive intersections. As shown in FIG. 4, one example of ADS 41 may be a bounding volume hierarchy (BVH) tree 60 in which nodes 62A-62E (“nodes 62”) of BVH tree 60 associated with bounding volumes 56 and primitives 52 of graphics scene 50 are hierarchically arranged into a tree-like structure.

Specifically, GPU 12 may arrange BVH tree 60 such that a node associated with a bounding volume that encloses another bounding volume may be a parent node of the node associated with the enclosed bounding volume. In the example of FIG. 3, because bounding volume 56C encloses bounding volumes 56D and 56E, which are subsets of bounding volume 56C, node 62C associated with bounding volume 56C may be a parent node of nodes 62D and 62E associated with bounding volumes 56D and 56E, respectively. Therefore, root node 62A may be associated with bounding volume 56A, interior node 62C may be associated with bounding volume 56C, and leaf nodes 62B, 62D, and 62E may be associated with bounding volumes 56B, 56D, and 56E, respectively.

Nodes of BVH tree 60 other than root node 62A may be referred to as non-root nodes of BVH tree 60. For example, interior node 62C and leaf nodes 62B, 62D, and 62E may be referred to as non-root nodes of BVH tree 60. Leaf nodes 62B, 62D, and 62E may each be linked with at least one primitive of primitives 52. For example, leaf node 62B may be linked with primitives 52A and 52B because bounding volume 56B associated with leaf node 62B encloses primitives 52A and 52B, leaf node 62D may be linked with primitives 52C and 52D because bounding volume 56D associated with leaf node 62D encloses primitives 52C and 52D, and leaf node 62E may be linked with primitive 52E because bounding volume 56E associated with leaf node 62E encloses primitive 52E. BVH tree 60 may be considered an unbalanced binary tree because each non-leaf node of BVH tree 60 has at most two child nodes, and because leaf nodes 62B, 62D, and 62E may have unequal depths.
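
The hierarchy of FIG. 4 can be sketched in a few lines of code. The following Python is a minimal illustration only, not the disclosed implementation: the `Node` class, its field names, and the `depth` helper are hypothetical names chosen for this sketch, and nodes, bounding volumes, and primitives are identified simply by their reference numerals as strings.

```python
class Node:
    """One node of a BVH-like tree (hypothetical structure for illustration)."""

    def __init__(self, name, volume, primitives=()):
        self.name = name                    # reference numeral of the node, e.g. "62C"
        self.volume = volume                # associated bounding volume, e.g. "56C"
        self.primitives = list(primitives)  # primitives linked to a leaf node
        self.children = []
        self.parent = None

    def add(self, child):
        """Attach child below this node and return the child."""
        child.parent = self
        self.children.append(child)
        return child


def depth(node):
    """Number of edges between node and the root."""
    d = 0
    while node.parent is not None:
        node, d = node.parent, d + 1
    return d


# Tree of FIG. 4: root 62A, interior node 62C, leaf nodes 62B, 62D, and 62E.
root = Node("62A", "56A")
n62b = root.add(Node("62B", "56B", ["52A", "52B"]))
n62c = root.add(Node("62C", "56C"))
n62d = n62c.add(Node("62D", "56D", ["52C", "52D"]))
n62e = n62c.add(Node("62E", "56E", ["52E"]))

# Leaf 62B sits one level higher than leaves 62D and 62E (unbalanced tree).
leaf_depths = {n.name: depth(n) for n in (n62b, n62d, n62e)}
print(leaf_depths)
```

In this sketch every non-leaf node has at most two children and the leaves have unequal depths, which matches the description of BVH tree 60 as an unbalanced binary tree.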

GPU 12 may traverse BVH tree 60 to determine whether a shadow ray intersects a primitive of primitives 52 by performing ray-box intersection tests for the bounding volumes 56 associated with nodes 62 of BVH tree 60. GPU 12 may start traversal of BVH tree 60 by performing a ray-box intersection test for bounding volume 56A associated with root node 62A. If GPU 12 determines that the shadow ray intersects bounding volume 56A, GPU 12 may continue to traverse BVH tree 60 to node 62B, and GPU 12 may perform a ray-box intersection test for bounding volume 56B associated with node 62B.

If GPU 12 determines that the shadow ray does not intersect bounding volume 56B, GPU 12 may recursively traverse BVH tree 60 up to node 62A and down to node 62C, and GPU 12 may perform a ray-box intersection test for bounding volume 56C associated with node 62C. If GPU 12 determines that the shadow ray intersects bounding volume 56C, GPU 12 may continue to traverse BVH tree 60 to node 62D, and GPU 12 may perform a ray-box intersection test for bounding volume 56D associated with node 62D. If GPU 12 determines that the shadow ray intersects bounding volume 56D, GPU 12 may perform ray-primitive intersection tests for the primitives linked to node 62D.

Therefore, GPU 12 may perform a ray-primitive intersection test for primitive 52C and may also perform a ray-primitive intersection test for primitive 52D to determine whether the shadow ray intersects primitive 52C or primitive 52D. GPU 12 may determine from the ray-primitive intersection test for primitive 52D that the shadow ray does intersect primitive 52D. Upon determining that the shadow ray does intersect a primitive (e.g., primitive 52D), GPU 12 may determine that the location in graphics scene 50 from which the shadow ray originates is occluded from the light source.

If GPU 12 determines that the shadow ray does not intersect primitive 52D, GPU 12 may continue to recursively traverse BVH tree 60 up to node 62C and down to node 62E, and GPU 12 may perform a ray-box intersection test for bounding volume 56E associated with node 62E. GPU 12 may determine, based on the ray-box intersection test, whether the shadow ray intersects bounding volume 56E and, upon making the determination, may end traversal of BVH tree 60 for the shadow ray.

If BVH tree 60 can be traversed starting from a non-root node, such as one of interior node 62C or leaf nodes 62B, 62D, and 62E, GPU 12 may reduce the number of ray intersection tests that it performs relative to starting traversal of BVH tree 60 from root node 62A, thereby increasing the efficiency of determining a primitive that is intersected by a ray. GPU 12 may start traversal of BVH tree 60 from a non-root node of BVH tree 60 by determining that a bounding volume associated with a non-root node of BVH tree 60 is intersected by a particular shadow ray. GPU 12 may rasterize at least a subset of bounding volumes 56 to an off-screen render target in graphics memory 40. GPU 12 may determine, based on rasterizing the plurality of bounding volumes 56 to the off-screen render target, a non-root node of BVH tree 60 as a start node in BVH tree 60 to start traversal of BVH tree 60. GPU 12 may traverse BVH tree 60 starting from the start node to determine the primitive that is intersected by ray 54.
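
The saving in ray-box intersection tests can be illustrated with a small experiment. The Python below is a sketch under stated assumptions rather than the disclosed implementation: bounding volumes are axis-aligned boxes tested with a standard slab test, primitives are stood in for by small boxes instead of triangles, and the names `ray_hits_box`, `make_node`, and `is_occluded`, as well as all coordinates, are hypothetical.

```python
def ray_hits_box(origin, direction, lo, hi):
    """Slab test: True if the ray origin + t*direction (t >= 0) meets the box."""
    t_near, t_far = 0.0, float("inf")
    for o, d, a, b in zip(origin, direction, lo, hi):
        if d == 0.0:
            if o < a or o > b:   # parallel to this slab and outside it
                return False
            continue
        t0, t1 = (a - o) / d, (b - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_near, t_far = max(t_near, t0), min(t_far, t1)
        if t_near > t_far:
            return False
    return True


def make_node(lo, hi, prims=(), children=()):
    return {"lo": lo, "hi": hi, "prims": list(prims), "children": list(children)}


def is_occluded(start, origin, direction):
    """Any-hit traversal from start; returns (hit, number of ray-box tests)."""
    box_tests = 0
    stack = [start]
    while stack:
        node = stack.pop()
        box_tests += 1
        if not ray_hits_box(origin, direction, node["lo"], node["hi"]):
            continue
        # Assumption: primitives are approximated by small boxes in this sketch.
        for lo, hi in node["prims"]:
            if ray_hits_box(origin, direction, lo, hi):
                return True, box_tests
        stack.extend(reversed(node["children"]))  # visit the first child first
    return False, box_tests


# Hypothetical coordinates loosely mirroring the tree of FIG. 4.
n62d = make_node((5, 5, 0), (7, 7, 4), prims=[((5.5, 5.5, 1), (6.5, 6.5, 2))])
n62e = make_node((8, 8, 5), (10, 10, 9), prims=[((8.5, 8.5, 6), (9.5, 9.5, 7))])
n62c = make_node((5, 5, 0), (10, 10, 10), children=[n62d, n62e])
n62b = make_node((0, 0, 0), (4, 4, 4), prims=[((1, 1, 1), (2, 2, 2))])
n62a = make_node((0, 0, 0), (10, 10, 10), children=[n62b, n62c])

origin, direction = (6.0, 6.0, -1.0), (0.0, 0.0, 1.0)
from_root = is_occluded(n62a, origin, direction)
from_62c = is_occluded(n62c, origin, direction)
print(from_root, from_62c)
```

For this ray both traversals report an intersection, but starting at the node standing in for 62C skips the tests for the root and for its sibling subtree.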

To perform shadow rendering of a graphics scene, such as graphics scene 50, using shadow rays, GPU 12 may render a representation of at least a portion of bounding volumes 56 of BVH tree 60 from the perspective of a light source for graphics scene 50. When representations of at least a portion of bounding volumes 56 are rendered from the perspective of the light source, GPU 12 can determine that locations in graphics scene 50 are not illuminated by the light source if shadow rays that originate from those locations intersect a primitive of primitives 52.

As discussed above, GPU 12 may traverse a hierarchical structure, such as BVH tree 60, to determine whether a shadow ray originating from a particular location within graphics scene 50 intersects with a primitive of primitives 52. If GPU 12 determines, via traversal of BVH tree 60, that the shadow ray intersects with a primitive of primitives 52, GPU 12 may determine that the particular location within graphics scene 50 from which the shadow ray originates is not illuminated by the light source for graphics scene 50. GPU 12 may typically traverse BVH tree 60 from root node 62A by performing ray-box intersection tests between the shadow ray and bounding volumes 56 associated with nodes 62 and/or ray-primitive intersection tests between the shadow ray and primitives 52 to determine whether the particular location is illuminated by the light source for scene 50.

However, BVH trees can be many levels deep. For example, if a BVH tree includes 16 levels, GPU 12 may be able to more efficiently determine whether a shadow ray intersects a primitive by starting traversal of the BVH tree from a non-root node of the BVH tree instead of starting from the root of the BVH tree. Because GPU 12 may use pixel shader programs and/or vertex shader programs running on processor cluster 46 to quickly rasterize pixels to an off-screen render target and to quickly sample pixels in the off-screen render target, GPU 12 may take advantage of the performance characteristics of these shader programs to determine non-root nodes from which to start traversal of a BVH tree.

FIG. 5 is a block diagram illustrating an example representation of bounding volumes rasterized to an example off-screen render target. GPU 12 may, for a specified number of top levels of a BVH tree, use a pixel shader and/or a vertex shader of its graphics processing pipeline executing on processor cluster 46 to rasterize representations of bounding volumes associated with the nodes of a BVH tree to an off-screen render target in graphics memory 40 or system memory 10 from a light source's point of view. GPU 12 may transform the representations of bounding volumes with a projection matrix such that GPU 12 rasterizes the representations of bounding volumes from the point of view of a particular light source of graphics scene 50.

If graphics scene 50 includes multiple light sources, GPU 12 may perform such a rasterization of representations of bounding volumes to the off-screen render target multiple times, once for each light source. GPU 12 may perform such rasterization with a different projection matrix for each light source, such that representations of bounding volumes are rasterized from the point of view of each light source.

In some examples, GPU 12 may rasterize only specified top levels of the BVH tree to the off-screen render target, such that given a 16-level BVH tree, GPU 12 may rasterize representations of bounding volumes associated with only the top levels (e.g., the top 5-6 levels) of the BVH tree. Thus, GPU 12 may, in some examples, rasterize representations of bounding volumes associated with fewer than all of the non-root nodes of the BVH tree to off-screen render target 64.
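
Selecting the nodes in the top levels of a tree amounts to a bounded breadth-first walk. The following Python sketch shows one way this might look; the `CHILDREN` table, the function name, and the convention that the root is level 1 are assumptions made for illustration and are not taken from the disclosure.

```python
from collections import deque

# Child lists for the tree of FIG. 4, keyed by node reference numeral.
CHILDREN = {"62A": ["62B", "62C"], "62C": ["62D", "62E"]}


def non_root_nodes_in_top_levels(root, children, levels):
    """Breadth-first list of non-root nodes at level <= levels (root is level 1)."""
    selected = []
    queue = deque([(root, 1)])
    while queue:
        node, level = queue.popleft()
        if level > 1:
            selected.append(node)   # only non-root nodes are rasterized
        if level < levels:
            for child in children.get(node, []):
                queue.append((child, level + 1))
    return selected


top_two = non_root_nodes_in_top_levels("62A", CHILDREN, 2)
top_three = non_root_nodes_in_top_levels("62A", CHILDREN, 3)
print(top_two, top_three)
```

With a level limit of 2, only the nodes standing in for 62B and 62C would be rasterized, so nodes 62D and 62E are examples of non-root nodes left out of the render target.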

As shown in FIG. 5, GPU 12 may rasterize representations 66A-66D (“representations 66”) of bounding volumes 56B-56E of graphics scene 50 associated with nodes 62B-62E of BVH tree 60 to off-screen render target 64 from the perspective of a light source for graphics scene 50. GPU 12 may rasterize representations 66 with a perspective matrix. GPU 12 may store off-screen render target 64 in graphics memory 40, system memory 10, or any other suitable memory.

GPU 12 may project bounding volumes 56B-56E, and shader units of processor cluster 46 of GPU 12 may rasterize representations of bounding volumes 56B-56E associated with the non-root nodes 62B-62E of BVH tree 60 as two-dimensional or three-dimensional representations, such as polygons, cubes, and the like. For example, a hardware rasterizer of GPU 12 may scan-convert each of bounding volumes 56 into pixels in render target 64. In one example, GPU 12 may rasterize a plurality of flat-shaded cubes to off-screen render target 64 as representations 66 of bounding volumes 56B-56E. GPU 12 may also scale and translate each of the representations 66 via a perspective matrix such that representations 66 are rasterized from a light source's point of view. In some examples, GPU 12 may rasterize representations 66 of bounding volumes 56B-56E in relatively lower resolution compared to bounding volumes 56B-56E in graphics scene 50. In this way, GPU 12 may further increase its performance in determining ray-primitive intersections.

GPU 12 may associate a different color value with each of the nodes 62 of BVH tree 60, and may, for each bounding volume of bounding volumes 56, rasterize, as the associated representation of the bounding volume of representations 66, a block of pixels having a color value associated with a node of BVH tree 60 that is associated with the respective bounding volume. In this way, GPU 12 may rasterize each of the representations 66 of bounding volumes 56B-56E in a different color, so that the color of each of the representations 66 may represent a node index that indicates the associated node in BVH tree 60.
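
One simple way to make a color stand for a node index is to pack the index into the color channels. The Python sketch below assumes an 8-bit-per-channel RGB render target, which the disclosure does not specify; the function names are likewise hypothetical.

```python
def node_index_to_color(index):
    """Pack a node index into an (r, g, b) triple of 8-bit channels (assumed format)."""
    if not 0 <= index < (1 << 24):
        raise ValueError("index does not fit in 24 bits")
    return ((index >> 16) & 0xFF, (index >> 8) & 0xFF, index & 0xFF)


def color_to_node_index(color):
    """Recover the node index from a sampled (r, g, b) color."""
    r, g, b = color
    return (r << 16) | (g << 8) | b


color = node_index_to_color(70000)
print(color, color_to_node_index(color))
```

Because the mapping is one-to-one, sampling a pixel and decoding its color identifies exactly one node, provided the render target is written without blending or filtering that would mix colors.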

GPU 12 may determine the color of the representations 66 by performing standard depth testing of the projected bounding volumes 56B-56E and by assigning different color values to pixels of representations 66 to represent different depths of the projected bounding volumes 56B-56E. GPU 12 may associate the color values assigned to the pixels of representations 66 with nodes 62 of BVH tree 60. In this way, GPU 12 may determine a node in BVH tree 60 associated with a pixel in render target 64 by sampling the color value of the pixel. As part of rasterizing representations 66 to render target 64, GPU 12 may determine a mapping of shadow rays to pixels in render target 64, so that, for a pixel in render target 64, GPU 12 may map it as possibly intersecting one or more shadow rays, such as shadow ray 70A that originates from location 72A or shadow ray 70B that originates from location 72B. In some examples, render target 64 may have a one-to-one mapping between a pixel and a shadow ray. In other examples, if representations 66 are rasterized at a relatively lower resolution (compared to the resolution of corresponding bounding volumes 56) to render target 64, a pixel may be mapped to multiple shadow rays.

To determine potential shadow ray-primitive intersections for a particular shadow ray that originates from a particular location in a graphics scene, GPU 12 may, for each ray, determine a pixel location in render target 64 that maps to the shadow ray. Given a shadow ray having an origin and a direction, GPU 12 may, based on the light source's projection matrix, map the shadow ray to a pixel location in render target 64. For that pixel location, GPU 12 may sample the color value of the pixel and determine the node associated with the sampled color value as the start node to start traversal of BVH tree 60.
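
The lookup from a shadow ray to a start node can be sketched as follows. This Python is illustrative only and rests on several assumptions not stated in the disclosure: the light is directional and shines along the z axis, so every shadow ray is parallel to the light direction and projects to a single point; the light's projection is orthographic; row 0 of the render target is the bottom row; and the render target stores node reference numerals directly instead of color values. The names `ortho_matrix`, `ray_to_pixel`, and `start_node` are hypothetical.

```python
def ortho_matrix(left, right, bottom, top):
    """Row-major 4x4 matrix mapping x in [left, right], y in [bottom, top] to [-1, 1]."""
    return [
        [2.0 / (right - left), 0.0, 0.0, -(right + left) / (right - left)],
        [0.0, 2.0 / (top - bottom), 0.0, -(top + bottom) / (top - bottom)],
        [0.0, 0.0, 1.0, 0.0],
        [0.0, 0.0, 0.0, 1.0],
    ]


def ray_to_pixel(matrix, origin, width, height):
    """Project the ray origin with the light's matrix and convert to pixel coordinates."""
    p = (origin[0], origin[1], origin[2], 1.0)
    ndc_x = sum(m * v for m, v in zip(matrix[0], p))
    ndc_y = sum(m * v for m, v in zip(matrix[1], p))
    px = min(width - 1, max(0, int((ndc_x + 1.0) * 0.5 * width)))
    py = min(height - 1, max(0, int((ndc_y + 1.0) * 0.5 * height)))
    return px, py


def start_node(target, matrix, origin):
    """Sample the render target at the pixel that the shadow ray maps to."""
    px, py = ray_to_pixel(matrix, origin, len(target[0]), len(target))
    return target[py][px]


# 8x8 render target; uncovered pixels fall back to the root node.
WIDTH = HEIGHT = 8
target = [["62A"] * WIDTH for _ in range(HEIGHT)]
for y in range(1, 4):
    for x in range(5, 8):
        target[y][x] = "62C"
for y in range(5, 8):
    for x in range(0, 3):
        target[y][x] = "62B"

light = ortho_matrix(0.0, 8.0, 0.0, 8.0)
print(start_node(target, light, (6.5, 2.5, 1.0)))
print(start_node(target, light, (1.0, 6.0, 0.0)))
```

For a point light, the shadow ray direction would vary per location and the projection would be a perspective one; the sketch leaves that case out.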

In the example of FIG. 5, GPU 12 may determine that pixel 76A is a pixel in render target 64 that is mapped to shadow ray 70A. GPU 12 may sample pixel 76A in render target 64 to determine the color value of pixel 76A, determine that the sampled color value of pixel 76A is the same as the color value associated with a node index for node 62B, and may thereby associate pixel 76A with node 62B. Thus, GPU 12 may set node 62B as the start node for traversing BVH tree 60 and may start traversal of BVH tree 60 from node 62B.

As GPU 12 traverses BVH tree 60 starting from node 62B, GPU 12 may first perform a ray-box intersection test for bounding volume 56B associated with node 62B. If GPU 12 determines that shadow ray 70A intersects bounding volume 56B, GPU 12 may perform a ray-primitive intersection test for primitive 52A that is linked to node 62B. If GPU 12 determines that shadow ray 70A does not intersect primitive 52A, GPU 12 may recursively traverse BVH tree 60 up to node 62B and may perform a ray-primitive intersection test for primitive 52B linked to node 62B. GPU 12 may determine from the ray-primitive intersection test for primitive 52B that shadow ray 70A does not intersect primitive 52B. Because GPU 12 starts traversal of BVH tree 60 from leaf node 62B and determines that shadow ray 70A intersects neither primitive 52A nor primitive 52B, GPU 12 may end the traversal of BVH tree 60 for shadow ray 70A. GPU 12 may therefore determine that location 72A, from which shadow ray 70A originates, is illuminated by the light source (and not occluded from the light source) for graphics scene 50 because shadow ray 70A does not intersect any primitives in scene 50. As can be seen, the traversal of BVH tree 60 to determine whether a primitive intersects shadow ray 70A may include performing relatively fewer ray-box intersection tests as opposed to the case in which GPU 12 is required to traverse BVH tree 60 starting from the root node.

In some examples, depending on the viewing angle, representations of bounding volumes of any two nodes may overlap when projected on screen. In this case the traversal of BVH tree 60 may start from the lowest common ancestor of the two overlapping bounding volumes, which may not be the root node. If two representations of bounding volumes overlap, and if a ray intersects the overlapped region of the two bounding volumes, GPU 12 may determine the lowest common ancestor node of the nodes associated with the bounding volumes and may start traversal of BVH tree 60 from the lowest common ancestor node of the nodes associated with the bounding volumes represented by the two overlapping representations. For example, while rasterizing representations 66 to render target 64, GPU 12 may determine that representation 66C and representation 66D overlap in area 69, where representation 66C is associated with node 62D and representation 66D is associated with node 62E, and where node 62D and node 62E are at the same level in BVH tree 60. GPU 12 may determine the color value associated with the lowest common ancestor node of nodes 62D and 62E and may set the color value of the pixels in area 69 (i.e., the region of overlap) to the same color value associated with the lowest common ancestor node of nodes 62D and 62E. In this example, GPU 12 may determine that node 62C is the lowest common ancestor node of nodes 62D and 62E and may set the color value of the pixels in area 69 to the color value of representation 66B that is associated with node 62C. If a particular ray maps to a pixel location within area 69, GPU 12 may start traversal of BVH tree 60 from node 62C, and not root node 62A, to determine any possible ray-primitive intersections.
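
The overlap rule can be sketched with a toy software rasterizer. The Python below is a hedged illustration, not the disclosed GPU implementation: representations are plain pixel rectangles, the render target stores node reference numerals in place of color values, and only nodes that are not ancestors of one another are drawn. The `PARENT` table and all function names are hypothetical.

```python
# Parent links for the tree of FIG. 4 (the root 62A has no entry).
PARENT = {"62B": "62A", "62C": "62A", "62D": "62C", "62E": "62C"}


def ancestors(node):
    """Path from node up to the root, including node itself."""
    path = [node]
    while node in PARENT:
        node = PARENT[node]
        path.append(node)
    return path


def lowest_common_ancestor(a, b):
    seen = set(ancestors(a))
    for node in ancestors(b):   # walk upward from b until a shared node is found
        if node in seen:
            return node
    raise ValueError("nodes are not in the same tree")


def rasterize(rects, width, height):
    """rects maps node -> (x0, y0, x1, y1), inclusive pixel bounds."""
    target = [[None] * width for _ in range(height)]
    for node, (x0, y0, x1, y1) in rects.items():
        for y in range(y0, y1 + 1):
            for x in range(x0, x1 + 1):
                current = target[y][x]
                # A pixel already claimed by another node gets their common ancestor.
                target[y][x] = node if current is None else lowest_common_ancestor(current, node)
    return target


target = rasterize({"62D": (0, 0, 3, 3), "62E": (2, 2, 5, 5)}, 6, 6)
for row in target:
    print(row)
```

In the printed grid, the four pixels where the two rectangles overlap carry the entry for node 62C, so a ray mapped there would start traversal at 62C and still reach both 62D and 62E.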

For example, if GPU 12 determines that shadow ray 70B emanating from location 72B maps to pixel 76B that lies within area 69, GPU 12 may determine to start traversal of BVH tree 60 from node 62C, which is not root node 62A, by sampling the color of mapped pixel 76B and determining that the color value of the sampled pixel is the same as the color value associated with node 62C. Because node 62C is associated with bounding volume 56C represented by representation 66B in render target 64, GPU 12 may perform a ray-box intersection test for bounding volume 56C associated with node 62C.

If GPU 12 determines that shadow ray 70B intersects bounding volume 56C, GPU 12 may traverse BVH tree 60 to node 62D. GPU 12 may perform a ray-box intersection test for bounding volume 56D associated with node 62D. If GPU 12 determines that shadow ray 70B intersects bounding volume 56D, GPU 12 may perform a ray-primitive intersection test for primitive 52C linked to node 62D. If GPU 12 determines that shadow ray 70B does not intersect primitive 52C, GPU 12 may recursively traverse BVH tree 60 up to node 62D and may perform a ray-primitive intersection test for primitive 52D linked to node 62D. GPU 12 may determine from the ray-primitive intersection test for primitive 52D that shadow ray 70B does intersect primitive 52D.

When GPU 12 determines that shadow ray 70B intersects one of primitives 52, GPU 12 may determine that location 72B, from which shadow ray 70B originates, is occluded from (not illuminated by) the light source of graphics scene 50 and may end the traversal of BVH tree 60 for shadow ray 70B. As can be seen, GPU 12 may accelerate the traversal of BVH tree 60 by rendering a representation of graphics scene 50 into render target 64, and sampling pixels from render target 64 to determine a non-root node from which to begin traversal of BVH tree 60.

FIG. 6 is a flowchart illustrating an example process for determining the start node for traversing an example hierarchical tree structure to find a shadow ray-primitive intersection. As shown in FIG. 6, the process may include organizing, by at least one processor, such as CPU 6 or GPU 12, a plurality of primitives 52 of a graphics scene (e.g., graphics scene 50) in a hierarchical data structure (e.g., BVH tree 60), wherein a plurality of bounding volumes 56 are associated with nodes 62 of the hierarchical data structure (702). The process may further include rasterizing, by CPU 6 or GPU 12, representations of each of the plurality of bounding volumes 56 to an off-screen render target 64 (704). The process may further include determining, by CPU 6 or GPU 12 and based at least in part on a pixel in the off-screen render target 64 that maps to a ray in the graphics scene 50, a non-root node of the hierarchical data structure associated with the pixel as a start node to start traversal of the hierarchical data structure (706). The process may further include traversing, by CPU 6 or GPU 12, a set of nodes of the hierarchical data structure starting from the start node to determine whether the ray in the graphics scene 50 intersects one of the plurality of primitives 52 (718).

The at least one processor may perform shadow rendering for a plurality of locations in graphics scene 50 by emanating a shadow ray from each of the plurality of locations, and determining possible shadow ray-primitive intersections for each of the shadow rays according to the above-mentioned process. If the at least one processor determines that a particular location in graphics scene 50 is occluded from the light source because the shadow ray emanating from the particular location intersects a primitive in graphics scene 50, the at least one processor may modify the pixel values of that particular location. For example, the at least one processor may modify the color values of the particular location to a black color value or another suitable color value that indicates the particular location is occluded from the light source.

Further, by rasterizing representations of each of the plurality of bounding volumes 56 to an off-screen render target 64, and determining a non-root node of the hierarchical data structure from which to start traversal of the hierarchical data structure based on the pixel in the off-screen render target 64 that maps to the shadow ray, the process provides a technological solution to an underlying technological problem in graphics processing of how to more efficiently traverse a hierarchical data structure by determining a non-root start node, thereby enabling the at least one processor to more efficiently perform shadow rendering for a scene.

In some examples, rasterizing the representation of each of the plurality of bounding volumes 56 to the off-screen render target 64 may further include associating, by CPU 6 or GPU 12, a different one of a plurality of color values with each of the plurality of nodes 62 of the hierarchical data structure, and for each bounding volume of the plurality of bounding volumes 56, rasterizing, by CPU 6 or GPU 12, a block of pixels having one of the different color values associated with one of the nodes of the hierarchical data structure that is associated with the respective bounding volume.

In some examples, determining the non-root node of the hierarchical data structure associated with the pixel as the start node to start traversal of the hierarchical data structure may further include determining, by CPU 6 or GPU 12, a node of the hierarchical data structure that is associated with a pixel color value of the pixel, and setting, by CPU 6 or GPU 12, the node of the hierarchical data structure that is associated with the pixel color value as the start node to start traversal of the hierarchical data structure.

In some examples, the process may further include determining, with CPU 6 or GPU 12, that a first representation of a first bounding volume of the bounding volumes overlaps a second representation of a second bounding volume of the bounding volumes, wherein the first bounding volume is associated with a first node of the hierarchical data structure and the second bounding volume is associated with a second node of the hierarchical data structure, and setting, by CPU 6 or GPU 12, color values of one or more pixels in a region of overlap of the first representation and the second representation to a node color value associated with a lowest common ancestor node of the first node and the second node.

In some examples, the process may further include, responsive to determining that the ray intersects one of the plurality of primitives 52, determining, by CPU 6 or GPU 12, that a location in the graphics scene 50 from which the ray emanates towards a light source is occluded from the light source, where the ray comprises a shadow ray. In some examples, rasterizing the representation of each of the plurality of bounding volumes 56 to the off-screen render target 64 may further include rasterizing, by CPU 6 or GPU 12, representations of the plurality of bounding volumes 56 associated with fewer than all of the non-root nodes of the hierarchical data structure to the off-screen render target 64.

In some examples, rasterizing the representation of each of the plurality of bounding volumes 56 to the off-screen render target 64 may further include rasterizing, by CPU 6 or GPU 12, a plurality of flat-shaded cubes to the off-screen render target 64 as the representations of each of the plurality of bounding volumes 56, and scaling and translating, by CPU 6 or GPU 12, each of the plurality of flat-shaded cubes to match a shape of a respective bounding volume. In some examples, the process may further include rendering the scene for display by a display device.
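
Scaling and translating a cube to match a bounding volume can be written as a single 4x4 matrix. The Python sketch below assumes an axis-aligned bounding volume given by its minimum and maximum corners and a unit cube spanning 0 to 1 on each axis; both are assumptions of this illustration, as are the function names and the example coordinates.

```python
from itertools import product


def cube_to_box_matrix(lo, hi):
    """Row-major 4x4 matrix that scales the unit cube [0, 1]^3 and moves it onto [lo, hi]."""
    sx, sy, sz = (h - l for l, h in zip(lo, hi))
    tx, ty, tz = lo
    return [
        [sx, 0.0, 0.0, tx],
        [0.0, sy, 0.0, ty],
        [0.0, 0.0, sz, tz],
        [0.0, 0.0, 0.0, 1.0],
    ]


def transform(matrix, point):
    """Apply the matrix to a 3D point (homogeneous coordinate w = 1)."""
    v = (*point, 1.0)
    return tuple(sum(m * c for m, c in zip(row, v)) for row in matrix[:3])


UNIT_CUBE = list(product((0.0, 1.0), repeat=3))   # the 8 corners of the unit cube
lo, hi = (5.0, 5.0, 0.0), (7.0, 7.0, 4.0)         # hypothetical bounding volume
corners = [transform(cube_to_box_matrix(lo, hi), c) for c in UNIT_CUBE]
print(corners)
```

The eight transformed corners coincide with the corners of the bounding volume, so one shared cube mesh could be reused for every node with only the matrix changing per draw.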

In one or more examples, the functions described may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium. Computer-readable media may include computer data storage media or communication media including any medium that facilitates transfer of a computer program from one place to another. Data storage media may be any available media that can be accessed by one or more computers or one or more processors to retrieve instructions, code and/or data structures for implementation of the techniques described in this disclosure. By way of example, and not limitation, such computer-readable media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. Also, any connection is properly termed a computer-readable medium. For example, if the software is transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. Disk and disc, as used herein, includes compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk and Blu-ray disc where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.

The code may be executed by one or more processors, such as one or more digital signal processors (DSPs), general purpose microprocessors, application specific integrated circuits (ASICs), field programmable logic arrays (FPGAs), or other equivalent integrated or discrete logic circuitry. Accordingly, the terms “processor” and “processing unit,” as used herein, may refer to any of the foregoing structure or any other structure suitable for implementation of the techniques described herein. In addition, in some aspects, the functionality described herein may be provided within dedicated hardware and/or software modules configured for encoding and decoding, or incorporated in a combined codec. Also, the techniques could be fully implemented in one or more circuits or logic elements.

The techniques of this disclosure may be implemented in a wide variety of devices or apparatuses, including a wireless handset, an integrated circuit (IC) or a set of ICs (i.e., a chip set). Various components, modules or units are described in this disclosure to emphasize functional aspects of devices configured to perform the disclosed techniques, but do not necessarily require realization by different hardware units. Rather, as described above, various units may be combined in a codec hardware unit or provided by a collection of interoperative hardware units, including one or more processors as described above, in conjunction with suitable software and/or firmware. Various examples have been described. These and other examples are within the scope of the following claims.

The invention claimed is:
 1. A method comprising: organizing, by at least one processor, a plurality of primitives of a scene in a hierarchical data structure, wherein a plurality of bounding volumes are associated with nodes of the hierarchical data structure; rasterizing, by the at least one processor, representations of each of the plurality of bounding volumes to an off-screen render target, wherein each representation of the representations of each of the plurality of bounding volumes is associated with a different color value of a plurality of color values; determining, by the at least one processor and based at least in part on a color value of the plurality of color values of a pixel of a representation of a respective one of the bounding volumes in the off-screen render target that maps to a ray, a non-root node of the hierarchical data structure associated with the respective one of the bounding volumes as a start node to start traversal of the hierarchical data structure; and traversing, by the at least one processor, a set of nodes of the hierarchical data structure starting from the start node to determine whether the ray in the scene intersects one of the plurality of primitives.
 2. The method of claim 1, wherein rasterizing the representation of each of the plurality of bounding volumes to the off-screen render target further comprises: associating, by the at least one processor, a different one of a plurality of color values with each of the plurality of nodes of the hierarchical data structure; and for each bounding volume of the plurality of bounding volumes, rasterizing, by the at least one processor, a block of pixels having one of the different color values associated with one of the nodes of the hierarchical data structure that is associated with the respective bounding volume.
 3. The method of claim 2, wherein determining the non-root node of the hierarchical data structure associated with the bounding volume as the start node to start traversal of the hierarchical data structure further comprises: determining, by the at least one processor, one of the nodes of the hierarchical data structure that is associated with the color value of the pixel; and setting, by the at least one processor, the node of the hierarchical data structure that is associated with the color value of the pixel as the start node to start traversal of the hierarchical data structure.
 4. The method of claim 2, further comprising: determining, by the at least one processor, that a first representation of a first bounding volume of the bounding volumes overlaps a second representation of a second bounding volume of the bounding volumes, wherein the first bounding volume is associated with a first node of the nodes of the hierarchical data structure and the second bounding volume is associated with a second node of the nodes of the hierarchical data structure; and setting, by the at least one processor, color values of one or more pixels in a region of overlap of the first representation and the second representation to a node color value associated with a lowest common ancestor node of the first node and the second node.
 5. The method of claim 1, further comprising: responsive to determining that the ray intersects one of the plurality of primitives, determining, by the at least one processor, that a location in the scene from which the ray emanates towards a light source is occluded from the light source, wherein the ray comprises a shadow ray.
 6. The method of claim 1, wherein rasterizing the representation of each of the plurality of bounding volumes to the off-screen render target further comprises: rasterizing, by the at least one processor, the representation of each of the plurality of bounding volumes to the off-screen render target from a point of view of a light source.
 7. The method of claim 1, wherein rasterizing the representation of each of the plurality of bounding volumes to the off-screen render target further comprises: rasterizing, by the at least one processor, a plurality of flat-shaded cubes to the off-screen render target as the representations of each of the plurality of bounding volumes; and scaling and translating, by the at least one processor, each of the plurality of flat-shaded cubes to match a shape of a respective bounding volume.
 8. The method of claim 1, further comprising: rendering, by the at least one processor, the scene for display by a display device.
 9. An apparatus configured to processgraphics data comprising: a memory; and at least one processorconfigured to: organize a plurality of primitives of a scene in ahierarchical data structure, wherein a plurality of bounding volumes areassociated with nodes of the hierarchical data structure; rasterizerepresentations of each of the plurality of bounding volumes to anoff-screen render target in the memory, wherein each representation ofthe representations of each of the plurality of bounding volumes isassociated with a different color value of a plurality of color values;determine, based at least in part on a color value of the plurality ofcolor values of a pixel of a representation of a respective one of thebounding volumes in the off-screen render target that maps to a ray inthe scene, a non-root node of the hierarchical data structure associatedwith the respective one of the bounding volumes as a start node to starttraversal of the hierarchical data structure; and traverse a set ofnodes of the hierarchical data structure starting from the start node todetermine whether the ray in the scene intersects one of the pluralityof primitives.
 10. The apparatus of claim 9, wherein the at least one processor is further configured to: associate a different one of a plurality of color values with each of the plurality of nodes of the hierarchical data structure; and for each bounding volume of the plurality of bounding volumes, rasterize a block of pixels having one of the different color values associated with one of the nodes of the hierarchical data structure that is associated with the respective bounding volume.
 11. The apparatus of claim 10, wherein the at least one processor is further configured to: determine one of the nodes of the hierarchical data structure that is associated with the color value of the pixel; and set the node of the hierarchical data structure that is associated with the color value of the pixel as the start node to start traversal of the hierarchical data structure.
 12. The apparatus of claim 10, wherein the at least one processor is further configured to: determine that a first representation of a first bounding volume of the bounding volumes overlaps a second representation of a second bounding volume of the bounding volumes, wherein the first bounding volume is associated with a first node of the nodes of the hierarchical data structure and the second bounding volume is associated with a second node of the nodes of the hierarchical data structure; and set color values of one or more pixels in a region of overlap of the first representation and the second representation to a node color value associated with a lowest common ancestor node of the first node and the second node.
 13. The apparatus of claim 9, wherein the at least one processor is further configured to: responsive to determining that the ray intersects one of the plurality of primitives, determine that a location in the scene from which the ray emanates towards a light source is occluded from the light source, wherein the ray comprises a shadow ray.
 14. The apparatus of claim 9, wherein the at least one processor is further configured to: rasterize the representation of each of the plurality of bounding volumes to the off-screen render target from a point of view of a light source.
 15. The apparatus of claim 9, wherein the at least one processor is further configured to: rasterize a plurality of flat-shaded cubes to the off-screen render target as the representations of each of the plurality of bounding volumes; and scale and translate each of the plurality of flat-shaded cubes to match a shape of a respective bounding volume.
 16. The apparatus of claim 9, wherein the apparatus further includes a display device, and wherein the at least one processor is further configured to: render the scene for display by the display device.
 17. An apparatus comprising: means for organizing a plurality of primitives of a scene in a hierarchical data structure, wherein a plurality of bounding volumes are associated with nodes of the hierarchical data structure; means for rasterizing representations of each of the plurality of bounding volumes to an off-screen render target, wherein each representation of the representations of each of the plurality of bounding volumes is associated with a different color value of a plurality of color values; means for determining, based at least in part on a color value of the plurality of color values of a pixel of a representation of a respective one of the bounding volumes in the off-screen render target that maps to a ray in the scene, a non-root node of the hierarchical data structure associated with the respective one of the bounding volumes as a start node to start traversal of the hierarchical data structure; and means for traversing a set of nodes of the hierarchical data structure starting from the start node to determine whether the ray in the scene intersects one of the plurality of primitives.
 18. The apparatus of claim 17, wherein the means for rasterizing the representation of each of the plurality of bounding volumes to the off-screen render target further comprises: means for associating a different one of a plurality of color values with each of the plurality of nodes of the hierarchical data structure; and means for rasterizing, for each bounding volume of the plurality of bounding volumes, a block of pixels having one of the different color values associated with one of the nodes of the hierarchical data structure that is associated with the respective bounding volume.
 19. The apparatus of claim 18, wherein the means for determining the non-root node of the hierarchical data structure associated with the bounding volume as the start node to start traversal of the hierarchical data structure further comprises: means for determining one of the nodes of the hierarchical data structure that is associated with the color value of the pixel; and means for setting the node of the hierarchical data structure that is associated with the color value of the pixel as the start node to start traversal of the hierarchical data structure.
 20. The apparatus of claim 18, further comprising: means for determining that a first representation of a first bounding volume of the bounding volumes overlaps a second representation of a second bounding volume of the bounding volumes, wherein the first bounding volume is associated with a first node of the nodes of the hierarchical data structure and the second bounding volume is associated with a second node of the nodes of the hierarchical data structure; and means for setting color values of one or more pixels in a region of overlap of the first representation and the second representation to a node color value associated with a lowest common ancestor node of the first node and the second node.
 21. The apparatus of claim 17, further comprising: means for determining, responsive to determining that the ray intersects one of the plurality of primitives, that a location in the scene from which the ray emanates towards a light source is occluded from the light source, wherein the ray comprises a shadow ray.
 22. The apparatus of claim 17, wherein the means for rasterizing the representation of each of the plurality of bounding volumes to the off-screen render target further comprises: means for rasterizing the representation of each of the plurality of bounding volumes to the off-screen render target from a point of view of a light source.
 23. The apparatus of claim 17, wherein the means for rasterizing the representation of each of the plurality of bounding volumes to the off-screen render target further comprises: means for rasterizing a plurality of flat-shaded cubes to the off-screen render target as the representations of each of the plurality of bounding volumes; and means for scaling and translating each of the plurality of flat-shaded cubes to match a shape of a respective bounding volume.
 24. A non-transitory computer-readable storage medium storing instructions that, when executed, cause one or more programmable processors to: organize a plurality of primitives of a scene in a hierarchical data structure, wherein a plurality of bounding volumes are associated with nodes of the hierarchical data structure; rasterize representations of each of the plurality of bounding volumes to an off-screen render target in the memory, wherein each representation of the representations of each of the plurality of bounding volumes is associated with a different color value of a plurality of color values; determine, based at least in part on a color value of the plurality of color values of a pixel of a representation of a respective one of the bounding volumes in the off-screen render target that maps to a ray in the scene, a non-root node of the hierarchical data structure associated with the respective one of the bounding volumes as a start node to start traversal of the hierarchical data structure; and traverse a set of nodes of the hierarchical data structure starting from the start node to determine whether the ray in the scene intersects one of the plurality of primitives.
 25. The non-transitory computer-readable storage medium of claim 24, wherein rasterize the representation of each of the plurality of bounding volumes to the off-screen render target further comprises: associate a different one of a plurality of color values with each of the plurality of nodes of the hierarchical data structure; and for each bounding volume of the plurality of bounding volumes, rasterize a block of pixels having one of the different color values associated with one of the nodes of the hierarchical data structure that is associated with the respective bounding volume.
 26. The non-transitory computer-readable storage medium of claim 25, wherein determine the non-root node of the hierarchical data structure associated with the bounding volume as the start node to start traversal of the hierarchical data structure further comprises: determine one of the nodes of the hierarchical data structure that is associated with a color value of the pixel; and set the node of the hierarchical data structure that is associated with the color value of the pixel as the start node to start traversal of the hierarchical data structure.
 27. The non-transitory computer-readable storage medium of claim 25, further comprising instructions that, when executed, cause one or more programmable processors to: determine that a first representation of a first bounding volume of the bounding volumes overlaps a second representation of a second bounding volume of the bounding volumes, wherein the first bounding volume is associated with a first node of the nodes of the hierarchical data structure and the second bounding volume is associated with a second node of the nodes of the hierarchical data structure; and set color values of pixels in a region of overlap of the first representation and the second representation to the node color value associated with a lowest common ancestor node of the first node and the second node.
 28. The non-transitory computer-readable storage medium of claim 24, further comprising instructions that, when executed, cause one or more programmable processors to: responsive to determining that the ray intersects one of the plurality of primitives, determine that a location in the scene from which the ray emanates towards a light source is occluded from the light source, wherein the ray comprises a shadow ray.
 29. The non-transitory computer-readable storage medium of claim 24, wherein rasterize the representation of each of the plurality of bounding volumes to the off-screen render target further comprises: rasterize the representation of each of the plurality of bounding volumes to the off-screen render target from a point of view of a light source.
 30. The non-transitory computer-readable storage medium of claim 24, wherein rasterize the representation of each of the plurality of bounding volumes to the off-screen render target further comprises: rasterize a plurality of flat-shaded cubes to the off-screen render target as the representations of each of the plurality of bounding volumes; and scale and translate each of the plurality of flat-shaded cubes to match a shape of a respective bounding volume.
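
For illustration only, the color-coded rasterization, the lowest-common-ancestor rule for overlapping representations, and the start-node lookup recited in the claims above may be sketched on a CPU as follows. This is a minimal sketch under stated assumptions and not the claimed implementation: bounding volumes and primitives are assumed to be already projected from the point of view of the light source into axis-aligned pixel rectangles, the integer `node_id` stands in for a node's color value, only leaf nodes are rasterized, depth along the ray is ignored, and an uncovered pixel is treated as a miss. All names (`Node`, `rasterize_leaves`, `pick_start_node`, `shadow_ray_hits`, `build_example`) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Assumption: (x0, y0, x1, y1) in render-target pixels, upper bounds exclusive.
Rect = Tuple[int, int, int, int]


@dataclass(eq=False)
class Node:
    node_id: int                        # stands in for the node's color value
    rect: Rect                          # bounding volume projected into light space
    primitives: List[Rect] = field(default_factory=list)  # leaf payload
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)


def link(parent: Node, *children: Node) -> None:
    for child in children:
        child.parent = parent
        parent.children.append(child)


def contains(rect: Rect, px: int, py: int) -> bool:
    x0, y0, x1, y1 = rect
    return x0 <= px < x1 and y0 <= py < y1


def lowest_common_ancestor(a: Node, b: Node) -> Node:
    # Collect a's ancestor chain, then walk up from b until the chains meet.
    seen = set()
    n: Optional[Node] = a
    while n is not None:
        seen.add(n.node_id)
        n = n.parent
    n = b
    while n.node_id not in seen:
        n = n.parent
    return n


def rasterize_leaves(leaves, by_id: Dict[int, Node], width: int, height: int):
    # Off-screen target: None means "no bounding volume covers this pixel".
    target = [[None] * width for _ in range(height)]
    for leaf in leaves:
        x0, y0, x1, y1 = leaf.rect
        for y in range(y0, y1):
            for x in range(x0, x1):
                current = target[y][x]
                if current is None:
                    target[y][x] = leaf.node_id
                else:
                    # Overlap: store the color of the lowest common ancestor.
                    lca = lowest_common_ancestor(by_id[current], leaf)
                    target[y][x] = lca.node_id
    return target


def pick_start_node(target, by_id, px: int, py: int) -> Optional[Node]:
    color = target[py][px]
    return None if color is None else by_id[color]


def shadow_ray_hits(target, by_id, px: int, py: int) -> bool:
    start = pick_start_node(target, by_id, px, py)
    stack = [] if start is None else [start]
    while stack:
        node = stack.pop()
        if not contains(node.rect, px, py):
            continue
        if any(contains(p, px, py) for p in node.primitives):
            return True                 # occluded: the ray hits a primitive
        stack.extend(node.children)
    return False


def build_example():
    root = Node(0, (0, 0, 8, 4))
    leaf1 = Node(1, (0, 0, 4, 4), primitives=[(1, 1, 2, 2)])
    inner = Node(2, (3, 0, 8, 4))
    leaf3 = Node(3, (3, 0, 6, 4), primitives=[(4, 1, 5, 2)])
    leaf4 = Node(4, (5, 0, 8, 4), primitives=[(7, 3, 8, 4)])
    link(root, leaf1, inner)
    link(inner, leaf3, leaf4)
    nodes = [root, leaf1, inner, leaf3, leaf4]
    return root, [leaf1, leaf3, leaf4], {n.node_id: n for n in nodes}
```

In this toy scene, a ray that maps to a pixel covered only by leaf 3 starts traversal at leaf 3 and skips the root entirely, whereas a pixel in the overlap of leaves 3 and 4 starts at their common parent (node 2). A pixel in the overlap of leaves 1 and 3 falls back to the root in this sketch, since the root is their lowest common ancestor.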